A comparative analysis of retrieval features used in the TREC 2006 Genomics Track passage retrieval task

Authors

  • Hari Krishna Rekapalli
  • Aaron M. Cohen
  • William R. Hersh
Abstract

OBJECTIVE Identify the set of features that best explained the variation in Mean Average Passage Precision (MAPP), the performance measure of the TREC 2006 Genomics information extraction task. METHODS A multivariate regression model was built using a backward-elimination approach, with MAPP modeled as a function of generalized features common to all the algorithms used by TREC 2006 Genomics Track participants. RESULTS Our regression analysis found that four factors were collectively associated with variation in MAPP: (1) normalization of keywords in the query, (2) use of the Entrez Gene thesaurus to look up synonymous terms, (3) the unit of text retrieved by each IR algorithm, and (4) the way a passage was defined. CONCLUSION These plausible hypotheses, generated by exploratory data analysis, help in understanding the results of the TREC 2006 Genomics passage extraction task. The approach is of general value for analyzing the results of similar common challenge tasks.


Similar resources

IIT TREC 2006: Genomics Track

For the TREC-2006 Genomics Track, we report on the effectiveness of composite information retrieval functions based on a dimensional data model for improving document, passage, and aspect search precision of genomics literature. We designed an approach, and developed a corresponding search engine, based on a novel dimensional data model capable of document, paragraph, sentence, and passage leve...


UB at TREC Genomics 2006: Using Passage Retrieval and Pre-Retrieval Query Expansion for Genomics IR

This paper presents the results of the University at Buffalo (UB) in TREC Genomics. For this task we used the SMART retrieval system and a pre-retrieval expansion method that uses the ABGene and MetaMap tools. We tried two different weighting schemes, one using pivoted length normalization (Lnu.ltu) and another using augmented tf-idf (atn.ann). The results show that performance of pivoted length...


York University at TREC 2006: Genomics Track

Our Genomics experiments mainly focus on addressing four problems in biomedical information retrieval. The four problems are: (1) how to deal with synonyms? (2) how to deal with the frequent use of acronyms? (3) how to deal with homonyms? (4) how to deal with the document-level retrieval, passage-level retrieval and aspect-level retrieval? In particular, we use the automatic query expansion algo...


I2R at TREC 2006 Genomics Track

This paper describes the method we used for the Genomics Track of TREC 2006. A BM25 model is implemented to retrieve relevant documents. We also tried re-ranking documents based on the initial retrieval before passage retrieval. Passages are retrieved based on the concepts defined in topics and on concept coverage. Results of submitted runs are listed and discussed.


Enhancing access to the Bibliome: the TREC 2004 Genomics Track

BACKGROUND The goal of the TREC Genomics Track is to improve information retrieval in the area of genomics by creating test collections that will allow researchers to improve and better understand failures of their systems. The 2004 track included an ad hoc retrieval task, simulating use of a search engine to obtain documents about biomedical topics. This paper describes the Genomics Track of t...


Journal title:
  • AMIA ... Annual Symposium proceedings. AMIA Symposium


Publication date: 2007